Abstract

The classical alternating minimization (or projection) algorithm has been successful in the context of solving
optimization problems over two variables. The iterative nature and simplicity of the algorithm have led to its
application in many areas such as signal processing, information theory, control, and finance.

A general set of sufficient conditions for the convergence and correctness of the algorithm is known when the
underlying problem parameters are fixed. In many practical situations, however, the underlying problem parameters
are changing over time, and the use of an adaptive algorithm is more appropriate. In this paper, we study such
an adaptive version of the alternating minimization algorithm. More precisely, we consider the impact of having a
slowly time-varying domain over which the minimization takes place. As a main result of this paper, we provide a
general set of sufficient conditions for the convergence and correctness of the adaptive algorithm. Perhaps somewhat
surprisingly, these conditions seem to be the minimal ones one would expect in such an adaptive setting. We
present applications of our results to adaptive decomposition of mixtures, adaptive log-optimal portfolio selection,
and adaptive filter design.

I. Introduction

A. Background

Solving an optimization problem over two variables in a product space is central to many applications
in areas such as signal processing, information theory, statistics, control, and finance. The alternating
minimization or projection algorithm has been extensively used in such applications due to its iterative
nature and simplicity.

The alternating minimization algorithm attempts to solve a minimization problem of the following form:
given $\mathcal{P}$, $\mathcal{Q}$ and a function $D : \mathcal{P} \times \mathcal{Q} \to \mathbb{R}$, minimize $D$ over
$\mathcal{P} \times \mathcal{Q}$. That is, find

    \min_{(P,Q) \in \mathcal{P} \times \mathcal{Q}} D(P,Q).

Often minimizing over both variables simultaneously is not straightforward. However, minimizing with
respect to one variable while keeping the other one fixed is often easy and sometimes possible analytically.
In such a situation, the alternating minimization algorithm described next is well suited: start with an
arbitrary initial point $Q_0 \in \mathcal{Q}$; for $n \ge 1$, iteratively compute

    P_n \in \arg\min_{P \in \mathcal{P}} D(P, Q_{n-1}),
    Q_n \in \arg\min_{Q \in \mathcal{Q}} D(P_n, Q).        (1)

In other words, instead of solving the original minimization problem over two variables, the alternating 
minimization algorithm solves a sequence of minimization problems over only one variable. If the 
algorithm converges, the converged value is returned as the solution to the original problem. Conditions 
for the convergence and correctness of such an algorithm, that is, conditions under which 

    \lim_{n \to \infty} D(P_n, Q_n) = \min_{(P,Q) \in \mathcal{P} \times \mathcal{Q}} D(P,Q),        (2)

have been of interest since the early 1950s. A general set of conditions, stated in the paper by Csiszár
and Tusnády [1, Theorem 2], is summarized in the next theorem.^1

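Theorem 1. Let $\mathcal{P}$ and $\mathcal{Q}$ be any two sets, and let $D : \mathcal{P} \times \mathcal{Q} \to \mathbb{R}$ be such that
for all $\tilde{P} \in \mathcal{P}$, $\tilde{Q} \in \mathcal{Q}$

    \arg\min_{P \in \mathcal{P}} D(P, \tilde{Q}) \neq \emptyset,
    \arg\min_{Q \in \mathcal{Q}} D(\tilde{P}, Q) \neq \emptyset.

Then the alternating minimization algorithm converges, i.e., (2) holds, if there exists a nonnegative function
$\delta : \mathcal{P} \times \mathcal{P} \to \mathbb{R}_+$ such that the following two properties hold:

(a) Three point property $(P, \tilde{P}, \tilde{Q})$: For all $P \in \mathcal{P}$, $\tilde{Q} \in \mathcal{Q}$,
$\tilde{P} \in \arg\min_{P' \in \mathcal{P}} D(P', \tilde{Q})$,

    \delta(P, \tilde{P}) + D(\tilde{P}, \tilde{Q}) \le D(P, \tilde{Q}).

(b) Four point property $(P, Q, \tilde{P}, \tilde{Q})$: For all $P, \tilde{P} \in \mathcal{P}$, $Q \in \mathcal{Q}$,
$\tilde{Q} \in \arg\min_{Q' \in \mathcal{Q}} D(\tilde{P}, Q')$,

    D(P, \tilde{Q}) \le D(P, Q) + \delta(P, \tilde{P}).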
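As a minimal illustration of iteration (1), not taken from the paper, consider the assumed special case in which the two sets are closed discs in the plane and $D(P,Q) = \|P - Q\|^2$. Each partial minimization is then a Euclidean projection onto a disc. The function names and the particular discs below are hypothetical choices made only for this sketch.

```python
import math


def project_to_disc(point, center, radius):
    """Return the point of the closed disc (center, radius) closest to `point`."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    dist = math.hypot(dx, dy)
    if dist <= radius:
        return point
    scale = radius / dist
    return (center[0] + scale * dx, center[1] + scale * dy)


def squared_distance(p, q):
    """Cost D(P, Q) = ||P - Q||^2 used in this sketch."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def alternating_minimization(q0, disc_p, disc_q, num_iters=200):
    """Iterate (1): minimize over P with Q fixed, then over Q with P fixed."""
    q = q0
    history = []
    for _ in range(num_iters):
        p = project_to_disc(q, *disc_p)  # P_n in argmin_P D(P, Q_{n-1})
        q = project_to_disc(p, *disc_q)  # Q_n in argmin_Q D(P_n, Q)
        history.append(squared_distance(p, q))
    return p, q, history


# Hypothetical example sets: two unit discs whose centers are 4 apart.
disc_p = ((0.0, 0.0), 1.0)
disc_q = ((4.0, 0.0), 1.0)
p, q, history = alternating_minimization((4.0, 1.0), disc_p, disc_q)
```

For these two discs the minimum of $D$ is $(4 - 1 - 1)^2 = 4$, and the recorded values in `history` approach it from above.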

B. Our Contribution 

In this paper, we consider an adaptive version of the above minimization problem. As before, suppose
we wish to find

    \min_{(P,Q) \in \mathcal{P} \times \mathcal{Q}} D(P,Q)

by means of an alternating minimization algorithm. However, on the $n$th iteration of the algorithm, we
are provided with sets $\mathcal{P}_n$, $\mathcal{Q}_n$ which are time-varying versions of the sets $\mathcal{P}$ and
$\mathcal{Q}$, respectively. That is, we are given a sequence of optimization problems

    \Big\{ \min_{(P,Q) \in \mathcal{P}_n \times \mathcal{Q}_n} D(P,Q) \Big\}_{n \ge 0}.        (3)

Such situations arise naturally in many applications. For example, in adaptive signal processing problems,
the changing parameters could be caused by a slowly time-varying system, with the index $n$ representing
time. An obvious approach is to solve each of the problems in (3) independently (one at each time instance
$n$). However, since the system varies only slowly with time, such an approach is likely to result in a lot
of redundant computation. Indeed, it is likely that a solution to the problem at time instance $n - 1$ will be
very close to the one at time instance $n$. A different approach is to use an adaptive algorithm instead. Such
an adaptive algorithm should be computationally efficient: given the tentative solution at time $n - 1$, the
tentative solution at time $n$ should be easy to compute. Moreover, if the time-varying system eventually
reaches steady state, the algorithm should converge to the optimal steady state solution. In other words,
instead of insisting that the adaptive algorithm solves (3) for every $n$, we only impose that it does so as
$n \to \infty$.

Given these requirements, a natural candidate for such an algorithm is the following adaptation of the
alternating minimization algorithm: start with an arbitrary initial $Q_0 \in \mathcal{Q}_0$; for $n \ge 1$ compute (cf. (1))

    P_n \in \arg\min_{P \in \mathcal{P}_n} D(P, Q_{n-1}),
    Q_n \in \arg\min_{Q \in \mathcal{Q}_n} D(P_n, Q).

^1 The conditions in [1] are actually slightly more general than the ones shown here and allow for functions $D$ that
take the value $+\infty$, i.e., $D : \mathcal{P} \times \mathcal{Q} \to \mathbb{R} \cup \{+\infty\}$.

Suppose that the sequences of sets $\{\mathcal{P}_n\}_{n \ge 0}$ and $\{\mathcal{Q}_n\}_{n \ge 0}$ converge (in a sense to be made
precise later) to sets $\mathcal{P}$ and $\mathcal{Q}$, respectively. We are interested in conditions under which

    \lim_{n \to \infty} D(P_n, Q_n) = \min_{(P,Q) \in \mathcal{P} \times \mathcal{Q}} D(P,Q).

As a main result of this paper, we provide a general set of sufficient conditions under which this adaptive
algorithm converges. These conditions are essentially the same as those of [1] summarized in Theorem 1.
The precise results are stated in Theorem 4.

C. Organization 

The remainder of this paper is organized as follows. In Section II, we introduce notation and some
preliminary results. Section III provides a convergence result for a fairly general class of adaptive alternating
minimization algorithms. We specialize this result to adaptive minimization of divergences in Section IV
and to adaptive minimization procedures in Hilbert spaces (with respect to the inner product induced norm) in
Section V. This work was motivated by several applications in which the need for an adaptive alternating
minimization algorithm arises. We present an application in the divergence minimization setting from
statistics and finance in Section IV and an application in the Hilbert space setting from adaptive signal
processing in Section V. Section VI contains concluding remarks.

II. Notations and Technical Preliminaries 

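In this section, we set up notation and present technical preliminaries needed in the remainder of the
paper. Let $(\mathcal{M}, d)$ be a compact metric space. Given two sets $\mathcal{A}, \mathcal{B} \subset \mathcal{M}$, define the
Hausdorff distance between them as

    d_H(\mathcal{A}, \mathcal{B}) \triangleq \max\Big\{ \sup_{A \in \mathcal{A}} \inf_{B \in \mathcal{B}} d(A, B),\ \sup_{B \in \mathcal{B}} \inf_{A \in \mathcal{A}} d(A, B) \Big\}.

It can be shown that $d_H$ is a metric, and in particular satisfies the triangle inequality. We write
$\mathcal{A}_n \xrightarrow{d_H} \mathcal{A}$ if $d_H(\mathcal{A}_n, \mathcal{A}) \to 0$ as $n \to \infty$.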
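The definition can be made concrete for finite point sets, where the suprema and infima become maxima and minima. The following is a small sketch under that assumption, with the Euclidean metric standing in for $d$; the function name `hausdorff_distance` is a hypothetical helper, not part of the paper.

```python
import math


def hausdorff_distance(set_a, set_b):
    """Hausdorff distance between two finite, nonempty sets of points in R^k."""

    def directed(xs, ys):
        # sup over x in xs of inf over y in ys of d(x, y)
        return max(min(math.dist(x, y) for y in ys) for x in xs)

    return max(directed(set_a, set_b), directed(set_b, set_a))


a = [(0.0, 0.0), (1.0, 0.0)]
b = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
c = [(0.0, 1.0)]

print(hausdorff_distance(a, b))  # 2.0: the point (1, 2) of b is at distance 2 from a
```

Note that both directed terms are needed: every point of `a` lies in `b`, so the first term is zero, while the second term is 2.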

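Consider a continuous function $D : \mathcal{M} \times \mathcal{M} \to \mathbb{R}$. For compact sets
$\mathcal{A}, \mathcal{B} \subset \mathcal{M}$, define the set

    \mathcal{G}(\mathcal{A}, \mathcal{B}) \triangleq \arg\min_{(A,B) \in \mathcal{A} \times \mathcal{B}} D(A, B).

With slight abuse of notation, let

    D(\mathcal{A}, \mathcal{B}) \triangleq \min_{(A,B) \in \mathcal{A} \times \mathcal{B}} D(A, B).

Due to compactness of the sets $\mathcal{A}$, $\mathcal{B}$ and continuity of $D$, we have
$\mathcal{G}(\mathcal{A}, \mathcal{B}) \neq \emptyset$, and hence $D(\mathcal{A}, \mathcal{B})$ is well-defined.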

A. Some Lemmas 

Here we state a few auxiliary lemmas used in the following. 

Lemma 2 ([1, Lemma 1]). Let $\{a_n\}_{n \ge 0}$, $\{b_n\}_{n \ge 0}$ be sequences of real numbers, satisfying

    a_n + b_n \le b_{n-1} + c

for all $n \ge 1$ and some $c \in \mathbb{R}$. If $\limsup_{n \to \infty} b_n > -\infty$ then

    \liminf_{n \to \infty} a_n \le c.

If, in addition,^2

    \sum_{n=0}^{\infty} (c - a_n)^+ < \infty,

then

    \lim_{n \to \infty} a_n = c.

^2 We use $(x)^+ \triangleq \max\{0, x\}$.

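Lemma 3. Let $\{\mathcal{A}_n\}_{n \ge 0}$ be a sequence of subsets of $\mathcal{M}$. Let $\mathcal{A}$ be a closed subset of
$\mathcal{M}$ such that $\mathcal{A}_n \xrightarrow{d_H} \mathcal{A}$. Consider any sequence $\{A_n\}_{n \ge 0}$ such that
$A_n \in \mathcal{A}_n$ for all $n \ge 0$, and such that $A_n \to A \in \mathcal{M}$. Then $A \in \mathcal{A}$.

Proof: Since $A_n \in \mathcal{A}_n$ and $\mathcal{A}_n \xrightarrow{d_H} \mathcal{A}$, the definition of Hausdorff distance implies
that there exists a sequence $\{\tilde{A}_n\}_{n \ge 0}$ such that $\tilde{A}_n \in \mathcal{A}$ for all $n$ and
$d(A_n, \tilde{A}_n) \to 0$ as $n \to \infty$. Therefore

    d(\tilde{A}_n, A) \le d(\tilde{A}_n, A_n) + d(A_n, A) \to 0

as $n \to \infty$. Since the sequence $\{\tilde{A}_n\}_{n \ge 0}$ is entirely in $\mathcal{A}$, this implies that $A$ is a
limit point of $\mathcal{A}$. As $\mathcal{A}$ is closed, we therefore have $A \in \mathcal{A}$. ■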
Let $(\mathcal{X}, d)$ be a metric space and $f : \mathcal{X} \to \mathbb{R}$. Define the modulus of continuity
$\omega_f : \mathbb{R}_+ \to \mathbb{R}_+$ of $f$ as

    \omega_f(t) \triangleq \sup_{x, x' \in \mathcal{X} :\ d(x,x') \le t} |f(x) - f(x')|.

Remark 1. Note that if $f$ is uniformly continuous then $\omega_f(t) \to 0$ as $t \to 0$. In particular, if
$(\mathcal{X}, d)$ is compact and $f$ is continuous then $f$ is uniformly continuous, and hence
$\lim_{t \to 0} \omega_f(t) = 0$.
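As a small worked example of this definition (an illustration added here, with the function $f$ and the space $[0,1]$ chosen only for concreteness), the modulus of continuity of $f(x) = x^2$ on $[0,1]$ can be computed in closed form:

```latex
% Assumed example: \mathcal{X} = [0,1], d(x,x') = |x - x'|, f(x) = x^2.
% Since |x^2 - x'^2| = |x - x'| (x + x'), for 0 \le t \le 1 the supremum
% is attained at x = 1, x' = 1 - t:
\omega_f(t) = \sup_{|x - x'| \le t} |x^2 - x'^2| = 1 - (1 - t)^2 = 2t - t^2,
\qquad 0 \le t \le 1,
% and \omega_f(t) = 1 for t \ge 1. In particular,
\lim_{t \to 0} \omega_f(t) = 0
\qquad \text{and} \qquad
\omega_f(t) \le 2t \quad \text{for all } t \ge 0.
```

The bound $\omega_f(t) \le 2t$ reflects that this $f$ is Lipschitz with constant 2 on $[0,1]$, in line with Remark 1.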

III. Adaptive Alternating Minimization Algorithms 

Here we present the precise problem formulation. We then present an adaptive algorithm and sufficient 
conditions for its convergence and correctness. 

A. Problem Statement 

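Consider a compact metric space $(\mathcal{M}, d)$, compact sets $\mathcal{P}, \mathcal{Q} \subset \mathcal{M}$, and a
continuous function $D : \mathcal{M} \times \mathcal{M} \to \mathbb{R}$. We want to find $D(\mathcal{P}, \mathcal{Q})$. However, we
are not given the sets $\mathcal{P}$, $\mathcal{Q}$ directly. Instead, we are given a sequence of compact sets
$\{(\mathcal{P}_n, \mathcal{Q}_n)\}_{n \ge 0}$: $\mathcal{P}_n, \mathcal{Q}_n \subset \mathcal{M}$ are revealed at time $n$ such that as
$n \to \infty$, $\mathcal{P}_n \xrightarrow{d_H} \mathcal{P}$ and $\mathcal{Q}_n \xrightarrow{d_H} \mathcal{Q}$. Given an arbitrary initial
$(P_0, Q_0) \in \mathcal{P}_0 \times \mathcal{Q}_0$, the goal is to find a sequence of points
$(P_n, Q_n) \in \mathcal{P}_n \times \mathcal{Q}_n$ such that

    \lim_{n \to \infty} D(P_n, Q_n) = D(\mathcal{P}, \mathcal{Q}).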

B. Algorithm

The problem formulation described in the last section suggests the following adaptive version of the
alternating minimization algorithm. Initially, we have $(P_0, Q_0) \in \mathcal{P}_0 \times \mathcal{Q}_0$. Recursively for
$n \ge 1$, pick any

    P_n \in \arg\min_{P \in \mathcal{P}_n} D(P, Q_{n-1}),
    Q_n \in \arg\min_{Q \in \mathcal{Q}_n} D(P_n, Q).

We call this the Adaptive Alternating Minimization (AAM) algorithm in the sequel. Note that if
$\mathcal{P}_n = \mathcal{P}$ and $\mathcal{Q}_n = \mathcal{Q}$ for all $n$, then the above algorithm specializes to the classical
alternating minimization algorithm.
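The AAM recursion can be sketched in a few lines for an assumed toy instance: the sets revealed at time $n$ are unit discs in the plane whose centers drift geometrically towards their limits, and $D(P,Q) = \|P - Q\|^2$, so that each step is a projection onto the disc revealed at time $n$. The helper names and the drift $2^{-n}$ are hypothetical choices for illustration only.

```python
import math


def project_to_disc(point, center, radius):
    """Return the point of the closed disc (center, radius) closest to `point`."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    dist = math.hypot(dx, dy)
    if dist <= radius:
        return point
    scale = radius / dist
    return (center[0] + scale * dx, center[1] + scale * dy)


def aam(q0, disc_p_at, disc_q_at, num_iters):
    """AAM recursion; disc_p_at(n) and disc_q_at(n) return (center, radius) at time n."""
    q = q0
    values = []
    for n in range(1, num_iters + 1):
        p = project_to_disc(q, *disc_p_at(n))  # P_n in argmin over the set revealed at time n
        q = project_to_disc(p, *disc_q_at(n))  # Q_n in argmin over the set revealed at time n
        values.append((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)
    return p, q, values


def disc_p_at(n):
    # Hypothetical time-varying set: unit disc whose center tends to (0, 0).
    return ((2.0 ** -n, 0.0), 1.0)


def disc_q_at(n):
    # Hypothetical time-varying set: unit disc whose center tends to (4, 0).
    return ((4.0, 2.0 ** -n), 1.0)


q0 = (4.0, 2.0)  # a point of the disc revealed at time 0 (center (4, 1), radius 1)
p, q, values = aam(q0, disc_p_at, disc_q_at, num_iters=60)
```

Two discs of equal radius are at Hausdorff distance equal to the distance between their centers, so in this sketch the sets converge geometrically fast to the unit discs centered at (0, 0) and (4, 0), for which the minimum of $D$ is 4. The values $D(P_n, Q_n)$ need not be monotone here, since the sets move between iterations.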
C. Sufficient Conditions for Convergence

In this section, we present a set of sufficient conditions under which the AAM algorithm converges to
$D(\mathcal{P}, \mathcal{Q})$. As we shall see, we need "three point" and "four point" properties (generalizing those in [1])
also in the adaptive setup. To this end, assume there exists a function^3 $\delta : \mathcal{M} \times \mathcal{M} \to \mathbb{R}$
such that the following conditions are satisfied.

(C1) Three point property $(P, \tilde{P}, \tilde{Q})$: for all $n \ge 1$, $P \in \mathcal{P}_n$, $\tilde{Q} \in \mathcal{Q}_{n-1}$,
$\tilde{P} \in \arg\min_{P' \in \mathcal{P}_n} D(P', \tilde{Q})$,

    \delta(P, \tilde{P}) + D(\tilde{P}, \tilde{Q}) \le D(P, \tilde{Q}).

(C2) Four point property $(P, Q, \tilde{P}, \tilde{Q})$: for all $n \ge 1$, $P, \tilde{P} \in \mathcal{P}_n$, $Q \in \mathcal{Q}_n$,
$\tilde{Q} \in \arg\min_{Q' \in \mathcal{Q}_n} D(\tilde{P}, Q')$,

    D(P, \tilde{Q}) \le D(P, Q) + \delta(P, \tilde{P}).

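Our main result is as follows.

Theorem 4. Let $\{(\mathcal{P}_n, \mathcal{Q}_n)\}_{n \ge 0}$, $\mathcal{P}$, $\mathcal{Q}$ be compact subsets of the compact metric
space $(\mathcal{M}, d)$ such that

    \mathcal{P}_n \xrightarrow{d_H} \mathcal{P}, \qquad \mathcal{Q}_n \xrightarrow{d_H} \mathcal{Q},

and let $D : \mathcal{M} \times \mathcal{M} \to \mathbb{R}$ be a continuous function. Let conditions C1 and C2 hold. Then,
under the AAM algorithm,

    \liminf_{n \to \infty} D(P_n, Q_n) = D(\mathcal{P}, \mathcal{Q}),

and all limit points of subsequences of $\{(P_n, Q_n)\}_{n \ge 0}$ achieving this lim inf belong to
$\mathcal{G}(\mathcal{P}, \mathcal{Q})$. If, in addition,

    \sum_{n=0}^{\infty} \omega(2\varepsilon_n) < \infty,

where $\varepsilon_n \triangleq d_H(\mathcal{P}_n, \mathcal{P}) + d_H(\mathcal{Q}_n, \mathcal{Q})$, and $\omega = \omega_D$ is the modulus of
continuity of $D$, then

    \lim_{n \to \infty} D(P_n, Q_n) = D(\mathcal{P}, \mathcal{Q}),

and all limit points of $\{(P_n, Q_n)\}_{n \ge 0}$ belong to $\mathcal{G}(\mathcal{P}, \mathcal{Q})$.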

Remark 2. Compared to the conditions of [1, Theorem 2] summarized in Theorem 1, the main additional
requirement here is in essence uniform continuity of the function $D$ (which is implied by compactness
of $\mathcal{M}$ and continuity of $D$), and summability of the $\omega(2\varepsilon_n)$. This is the least one would expect
in this adaptive setup to obtain a conclusion as in Theorem 1.
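To see what the summability condition asks for in a simple case, the following worked example adds the assumption (not made in the theorem) that $D$ is Lipschitz with some constant $L$ with respect to the sum metric $d(A,A') + d(B,B')$ on $\mathcal{M} \times \mathcal{M}$:

```latex
% Added assumption for this illustration: D is L-Lipschitz, i.e.
|D(A,B) - D(A',B')| \le L \,\big( d(A,A') + d(B,B') \big).
% Then the modulus of continuity satisfies \omega(t) \le L t, and therefore
\sum_{n=0}^{\infty} \omega(2\varepsilon_n) \;\le\; 2L \sum_{n=0}^{\infty} \varepsilon_n .
% Hence it suffices that the Hausdorff distances are summable, e.g.
\varepsilon_n = O(n^{-2}) \quad \text{or} \quad \varepsilon_n = O(\rho^{n}) \ \text{with} \ 0 < \rho < 1,
% whereas \varepsilon_n of order 1/n gives no guarantee from this bound.
```

In this Lipschitz case the extra requirement therefore reduces to the sets converging fast enough for $\sum_n \varepsilon_n$ to be finite.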

D. Proof of Theorem 4

We start with some preliminaries. Given that $(\mathcal{M}, d)$ is compact, the product space
$(\mathcal{M} \times \mathcal{M}, d_2)$ with

    d_2((A, B), (A', B')) \triangleq d(A, A') + d(B, B')

for all $(A, B), (A', B') \in \mathcal{M} \times \mathcal{M}$, is compact. Let $\omega : \mathbb{R}_+ \to \mathbb{R}_+$ be the modulus of
continuity of $D$ with respect to the metric space $(\mathcal{M} \times \mathcal{M}, d_2)$. By definition of $\omega$, for any
$\varepsilon > 0$ and $(A, B), (A', B') \in \mathcal{M} \times \mathcal{M}$ such that

    d_2((A, B), (A', B')) \le \varepsilon,

we have

    |D(A, B) - D(A', B')| \le \omega(\varepsilon).

^3 Note that unlike the condition in [1], we do not require $\delta$ to be nonnegative here.

Moreover, continuity of $D$ and compactness of $\mathcal{M} \times \mathcal{M}$ imply (see Remark 1) that
$\omega(\varepsilon) \to 0$ as $\varepsilon \to 0$. Recall the definition of

    \varepsilon_n = d_H(\mathcal{P}_n, \mathcal{P}) + d_H(\mathcal{Q}_n, \mathcal{Q}).

By the hypothesis of Theorem 4 we have $\varepsilon_n \to 0$ as $n \to \infty$, and by the triangle inequality

    d_H(\mathcal{P}_n, \mathcal{P}_{n-1}) + d_H(\mathcal{Q}_n, \mathcal{Q}_{n-1}) \le \varepsilon_{n-1} + \varepsilon_n \triangleq \gamma_n,

with $\gamma_n \to 0$ as $n \to \infty$.

We now proceed to the proof of Theorem 4. Condition C1 implies that for all $n \ge 1$, $P \in \mathcal{P}_n$,
$Q \in \mathcal{Q}_n$,

    \delta(P, P_n) + D(P_n, Q_{n-1}) \le D(P, Q_{n-1}).        (4)

Condition C2 implies that for all $n \ge 1$, $P \in \mathcal{P}_n$, $Q \in \mathcal{Q}_n$,

    D(P, Q_n) \le D(P, Q) + \delta(P, P_n).        (5)

Adding (4) and (5), we obtain that for all $n \ge 1$, $P \in \mathcal{P}_n$, $Q \in \mathcal{Q}_n$,

    D(P_n, Q_{n-1}) + D(P, Q_n) \le D(P, Q_{n-1}) + D(P, Q).        (6)

Given that $d_H(\mathcal{Q}_{n-1}, \mathcal{Q}_n) \le \gamma_n$, there exists $\tilde{Q}_n \in \mathcal{Q}_n$ such that
$d(Q_{n-1}, \tilde{Q}_n) \le \gamma_n$. It follows that

    d_2((P_n, \tilde{Q}_n), (P_n, Q_{n-1})) \le \gamma_n,

and hence

    |D(P_n, \tilde{Q}_n) - D(P_n, Q_{n-1})| \le \omega(\gamma_n).        (7)

From (7) and the AAM algorithm, we have

    D(P_n, Q_n) = \min_{Q \in \mathcal{Q}_n} D(P_n, Q)
                \le D(P_n, \tilde{Q}_n)        (since \tilde{Q}_n \in \mathcal{Q}_n)        (8)
                \le D(P_n, Q_{n-1}) + \omega(\gamma_n).

Adding inequalities (6) and (8),

    D(P_n, Q_n) + D(P, Q_n) \le D(P, Q_{n-1}) + D(P, Q) + \omega(\gamma_n),        (9)

for all $P \in \mathcal{P}_n$, $Q \in \mathcal{Q}_n$.

Since $\mathcal{P}_n \xrightarrow{d_H} \mathcal{P}$ and $\mathcal{Q}_n \xrightarrow{d_H} \mathcal{Q}$, there exists a sequence
$(P_n^*, Q_n^*) \in \mathcal{P}_n \times \mathcal{Q}_n$ such that $(P_n^*, Q_n^*) \to (P^*, Q^*) \in \mathcal{G}(\mathcal{P}, \mathcal{Q})$ and
$d_2((P_n^*, Q_n^*), (P^*, Q^*)) \le \varepsilon_n$ for all $n \ge 0$. Pick any such sequence $\{(P_n^*, Q_n^*)\}_{n \ge 0}$.
Replacing $(P, Q)$ in (9) by this $(P_n^*, Q_n^*)$, we obtain

    D(P_n, Q_n) + D(P_n^*, Q_n) \le D(P_n^*, Q_{n-1}) + D(P_n^*, Q_n^*) + \omega(\gamma_n).        (10)

By choice of the $(P_n^*, Q_n^*)$,

    D(P_n^*, Q_n^*) \le D(P^*, Q^*) + \omega(\varepsilon_n).        (11)

Moreover,

    d(P_{n-1}^*, P_n^*) \le d(P_{n-1}^*, P^*) + d(P^*, P_n^*) \le \gamma_n,

and therefore

    D(P_n^*, Q_{n-1}) \le D(P_{n-1}^*, Q_{n-1}) + \omega(\gamma_n).        (12)

Combining inequalities (11) and (12) with (10), we obtain

    D(P_n, Q_n) + D(P_n^*, Q_n) \le D(P_{n-1}^*, Q_{n-1}) + D(P^*, Q^*) + 2\omega(\gamma_n) + \omega(\varepsilon_n).        (13)

Define

    a_n \triangleq D(P_n, Q_n) - 2\omega(\gamma_n) - \omega(\varepsilon_n),
    b_n \triangleq D(P_n^*, Q_n),
    c \triangleq D(P^*, Q^*),

and note that by (13)

    a_n + b_n \le b_{n-1} + c.

Since $D$ is a continuous function over the compact set $\mathcal{M} \times \mathcal{M}$, it is also a bounded function.
Hence we have $\limsup_{n \to \infty} |b_n| < \infty$. Applying Lemma 2,

    \liminf_{n \to \infty} D(P_n, Q_n) \le D(P^*, Q^*) + \limsup_{n \to \infty} \big( 2\omega(\gamma_n) + \omega(\varepsilon_n) \big).        (14)

Since $\gamma_n \to 0$ and $\varepsilon_n \to 0$ imply $2\omega(\gamma_n) + \omega(\varepsilon_n) \to 0$, (14) yields

    \liminf_{n \to \infty} D(P_n, Q_n) \le D(\mathcal{P}, \mathcal{Q}).        (15)

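Now, let $\{n_k\}_{k \ge 0}$ be a subsequence such that

    \liminf_{n \to \infty} D(P_n, Q_n) = \lim_{k \to \infty} D(P_{n_k}, Q_{n_k}).

By compactness of $\mathcal{M} \times \mathcal{M}$ we can assume without loss of generality that
$P_{n_k} \to \tilde{P}$, $Q_{n_k} \to \tilde{Q}$ for some $\tilde{P}, \tilde{Q} \in \mathcal{M}$. Since $\mathcal{P}$ and $\mathcal{Q}$ are
compact, Lemma 3 shows that $\tilde{P} \in \mathcal{P}$, $\tilde{Q} \in \mathcal{Q}$. By continuity of $D$ this implies that

    \liminf_{n \to \infty} D(P_n, Q_n) = \lim_{k \to \infty} D(P_{n_k}, Q_{n_k})
                                       = D(\tilde{P}, \tilde{Q})
                                       \ge D(\mathcal{P}, \mathcal{Q}).

Together with (15), this shows that

    \liminf_{n \to \infty} D(P_n, Q_n) = D(\mathcal{P}, \mathcal{Q}),

and that all limit points of subsequences of $\{(P_n, Q_n)\}_{n \ge 0}$ achieving this lim inf belong to
$\mathcal{G}(\mathcal{P}, \mathcal{Q})$. This completes the proof of the first part of Theorem 4.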
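Suppose now that we have in addition

    \sum_{n=0}^{\infty} \omega(2\varepsilon_n) < \infty.        (16)

Since

    D(P_n, Q_n) \ge \min_{P \in \mathcal{P}_n, Q \in \mathcal{Q}_n} D(P, Q)
                \ge \min_{P \in \mathcal{P}, Q \in \mathcal{Q}} D(P, Q) - \omega(\varepsilon_n)
                = D(P^*, Q^*) - \omega(\varepsilon_n),

we have

    (c - a_n)^+ = \big( D(P^*, Q^*) - D(P_n, Q_n) + 2\omega(\gamma_n) + \omega(\varepsilon_n) \big)^+
                \le 2\big( \omega(\gamma_n) + \omega(\varepsilon_n) \big)
                \le 2\big( \omega(2\varepsilon_n) + \omega(2\varepsilon_{n-1}) + \omega(\varepsilon_n) \big)
                \le 2\big( 2\omega(2\varepsilon_n) + \omega(2\varepsilon_{n-1}) \big).

Thus by (16)

    \sum_{n=0}^{\infty} (c - a_n)^+ < \infty,

and applying again Lemma 2 yields

    \lim_{n \to \infty} D(P_n, Q_n) = D(P^*, Q^*).        (17)

As every limit point of $\{(P_n, Q_n)\}_{n \ge 0}$ belongs to $\mathcal{P} \times \mathcal{Q}$ by Lemma 3, (17) and continuity of
$D$ imply that if (16) holds then every limit point of $\{(P_n, Q_n)\}_{n \ge 0}$ must also belong to
$\mathcal{G}(\mathcal{P}, \mathcal{Q})$. This concludes the proof of Theorem 4.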

IV. Divergence Minimization 

In this section, we specialize the algorithm from Section III to the case of alternating divergence
minimization. A large class of problems can be formulated as a minimization of divergences. Examples
include the computation of channel capacity and rate distortion function [2], [3], selection of log-optimal
portfolios [4], and maximum likelihood estimation from incomplete data [5]. These problems were shown to be
divergence minimization problems in [1]. For further applications of alternating divergence minimization
algorithms, see [6]. We describe applications to the problem of adaptive mixture decomposition and of
adaptive log-optimal portfolio selection.

A. Setting 

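Given a finite set $\Sigma$ and some constants $0 < b \le B$, let $\mathcal{M} = \mathcal{M}(\Sigma, b, B)$ be the set of all
measures $P$ on $\Sigma$ such that

    \sum_{\sigma \in \Sigma} P(\sigma) \le B, \quad \text{and} \quad P(\sigma) \ge b, \ \forall \sigma \in \Sigma.        (18)

Endow $\mathcal{M}$ with the topology induced by the metric $d : \mathcal{M} \times \mathcal{M} \to \mathbb{R}_+$ defined as

    d(P, Q) \triangleq \max_{\sigma \in \Sigma} |P(\sigma) - Q(\sigma)|.

It is easy to check that the metric space $(\mathcal{M}, d)$ is compact. The cost function $D$ of interest is
divergence^4

    D(P, Q) \triangleq D(P \| Q) \triangleq \sum_{\sigma \in \Sigma} P(\sigma) \log \frac{P(\sigma)}{Q(\sigma)}

for any $P, Q \in \mathcal{M}$. Note that (18) ensures that $D$ is well defined (i.e., does not take the value $\infty$).
It is well-known (and easy to check) that the function $D$ is continuous and convex in both arguments. Finally,
define the function $\delta : \mathcal{M} \times \mathcal{M} \to \mathbb{R}$

    \delta(P, \tilde{P}) \triangleq D(P \| \tilde{P}) - \sum_{\sigma \in \Sigma} \big( P(\sigma) - \tilde{P}(\sigma) \big).

In [1], it has been established that for convex $\mathcal{P}$ and $\mathcal{Q}$ the pair of functions $D$, $\delta$ satisfy
the "three point" and "four point" properties C1 and C2. As stated above, the space
$\mathcal{M} = \mathcal{M}(\Sigma, b, B)$ with metric $d$ is a compact metric space, and the function $D$ is continuous.
Hence Theorem 4 applies in this setting.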
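The two functions of this setting are easy to evaluate numerically. The sketch below represents a measure on a finite set as a Python dict with strictly positive values (an assumed representation; the names `divergence` and `delta` are hypothetical helpers). It also illustrates that, for measures that are not normalized, $D$ itself may be negative while $\delta$ stays nonnegative in this example.

```python
import math


def divergence(p, q):
    """D(P||Q) = sum_s P(s) log(P(s)/Q(s)), natural logarithm, positive measures."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in p)


def delta(p, p_tilde):
    """delta(P, P~) = D(P||P~) - sum_s (P(s) - P~(s))."""
    return divergence(p, p_tilde) - sum(p[s] - p_tilde[s] for s in p)


# Hypothetical measures on a two-point set; neither sums to one.
p = {"a": 0.2, "b": 0.3}
q = {"a": 0.5, "b": 0.4}

d_pq = divergence(p, q)   # negative here, since q has more total mass than p
delta_pq = delta(p, q)    # nonnegative in this example

# For probability distributions the correction term vanishes.
u = {"a": 0.5, "b": 0.5}
v = {"a": 0.9, "b": 0.1}
```

For the normalized pair `u`, `v` the two functions coincide, because the total masses cancel in the correction term.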

^4 All logarithms are with respect to base $e$.

B. Application: Decomposition of Mixtures and Log-Optimal Portfolio Selection

We consider an application of our adaptive divergence minimization algorithm to the problem of
decomposing a mixture. A special case of this setting yields the problem of log-optimal portfolio selection.

We are given a sequence of i.i.d. random variables $\{Y_l\}_{l \ge 1}$, each taking values in the finite set
$\mathcal{Y}$. $Y_l$ is distributed according to the mixture $\sum_{i=1}^{I} c_i \mu_i$, where the $\{c_i\}_{i=1}^{I}$ sum to one,
$c_i \ge c_0 > 0$ for all $i \in \{1, \ldots, I\}$, and where $\{\mu_i\}_{i=1}^{I}$ are distributions on $\mathcal{Y}$. We assume
that $\mu_i(y) \ge \mu_0 > 0$ for all $y \in \mathcal{Y}$, $i \in \{1, \ldots, I\}$. The goal is to compute an estimate of
$\{c_i\}_{i=1}^{I}$ from $\{Y_l\}_{l=1}^{n}$ and knowing $\{\mu_i\}_{i=1}^{I}$.

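Let $\hat{P}_n : \mathcal{Y} \to [0, 1]$,

    \hat{P}_n(y) \triangleq \frac{1}{n} \sum_{l=1}^{n} 1\{Y_l = y\},

be the empirical distribution of $\{Y_l\}_{l=1}^{n}$. The maximum likelihood estimator of $\{c_i\}_{i=1}^{I}$ is given by
(see, e.g., [7, Lemma 3.1])

    \arg\min_{\{\tilde{c}_i\}} D\Big( \hat{P}_n \,\Big\|\, \sum_{i=1}^{I} \tilde{c}_i \mu_i \Big).        (19)

Following [7, Example 5.1], we define

    \Sigma \triangleq \{1, \ldots, I\} \times \mathcal{Y},
    \mathcal{Q}_n = \mathcal{Q} \triangleq \{ Q : Q(i, y) = \tilde{c}_i \mu_i(y), \text{ for some } \{\tilde{c}_i\} \text{ with } \sum_i \tilde{c}_i = 1,\ \tilde{c}_i \ge c_0 \ \forall i \},        (20)
    \mathcal{P}_n \triangleq \{ P : \sum_{i=1}^{I} P(i, y) = \hat{P}_n(y),\ P(i, y) \ge 0 \ \forall i, y \}.

Note that $\mathcal{P}_n$ and $\mathcal{Q}$ are convex and compact. From [7, Lemma 5.1], we have

    \min_{\{\tilde{c}_i\}} D\Big( \hat{P}_n \,\Big\|\, \sum_{i=1}^{I} \tilde{c}_i \mu_i \Big) = \min_{P \in \mathcal{P}_n} \min_{Q \in \mathcal{Q}} D(P \| Q),

and the minimizer of the left hand side (and hence (19)) is recovered from the corresponding marginal
of the optimal $Q$ on the right hand side.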

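We now show how the projections on the sets $\mathcal{P}_n$ and $\mathcal{Q}$ can be computed. Fix a $P$, assuming without
loss of generality that

    \sum_{y \in \mathcal{Y}} P(1, y) \ge \sum_{y \in \mathcal{Y}} P(2, y) \ge \cdots \ge \sum_{y \in \mathcal{Y}} P(I, y).

We want to minimize $D(P \| Q)$ over all $Q \in \mathcal{Q}$, or, equivalently, over all valid $\{\tilde{c}_i\}$. The $\{\tilde{c}_i\}$
minimizing $D(P \| Q)$ can be shown to be of the form $\tilde{c}_i \ge c_0$ for all $i \le J^*$ and $\tilde{c}_i = c_0$ for all
$i > J^*$. More precisely, define the normalizing constant (fixed by the constraint $\sum_i \tilde{c}_i = 1$)

    \eta_J \triangleq \frac{1 - (I - J) c_0}{\sum_{i=1}^{J} \sum_{y \in \mathcal{Y}} P(i, y)},

and choose $J^* \in \{1, \ldots, I\}$ such that

    \eta_{J^*} \sum_{y \in \mathcal{Y}} P(i, y) \ge c_0 \quad \text{for } 1 \le i \le J^*,
    \eta_{J^*} \sum_{y \in \mathcal{Y}} P(i, y) < c_0 \quad \text{for } J^* < i \le I.

Then the optimal $\{\tilde{c}_i\}$ are given by

    \tilde{c}_i = \eta_{J^*} \sum_{y \in \mathcal{Y}} P(i, y) \quad \text{for } 1 \le i \le J^*,
    \tilde{c}_i = c_0 \quad \text{for } J^* < i \le I.

For fixed $Q(i, y) = \tilde{c}_i \mu_i(y)$, the minimizing $P$ is

    P(i, y) = \frac{\tilde{c}_i \mu_i(y)}{\sum_{j=1}^{I} \tilde{c}_j \mu_j(y)} \hat{P}_n(y).        (21)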
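The two projection steps can be sketched as follows for a fixed empirical distribution. This is an illustration under stated assumptions, not the authors' implementation: measures on $\{1, \ldots, I\} \times \mathcal{Y}$ are stored as lists of rows, the function names are hypothetical, and the weight update uses the thresholded form $\tilde{c}_i = \max\{\eta \sum_y P(i,y), c_0\}$ with $\eta$ fixed by the constraint that the weights sum to one (this requires $I c_0 \le 1$).

```python
def project_onto_Q(P, c0):
    """Weights c~ minimizing D(P||Q) over Q(i, y) = c~_i mu_i(y),
    subject to sum_i c~_i = 1 and c~_i >= c0 (assumes len(P) * c0 <= 1)."""
    num = len(P)
    mass = [sum(row) for row in P]                 # sum over y of P(i, y)
    order = sorted(range(num), key=lambda i: -mass[i])
    for J in range(num, 0, -1):                    # try the largest active set first
        eta = (1.0 - (num - J) * c0) / sum(mass[i] for i in order[:J])
        if eta * mass[order[J - 1]] >= c0:         # smallest active weight is feasible
            break
    c = [c0] * num
    for i in order[:J]:
        c[i] = eta * mass[i]
    return c


def project_onto_P(c, mu, p_hat):
    """Step (21): P(i, y) = c_i mu_i(y) / sum_j c_j mu_j(y) * p_hat(y)."""
    num, size = len(mu), len(p_hat)
    mix = [sum(c[j] * mu[j][y] for j in range(num)) for y in range(size)]
    return [[c[i] * mu[i][y] / mix[y] * p_hat[y] for y in range(size)]
            for i in range(num)]


def decompose_mixture(mu, p_hat, c0, num_iters=500):
    """Alternate the two projections, starting from uniform weights."""
    c = [1.0 / len(mu)] * len(mu)
    for _ in range(num_iters):
        P = project_onto_P(c, mu, p_hat)
        c = project_onto_Q(P, c0)
    return c


# Hypothetical example: two known components, true weights (0.3, 0.7),
# and the exact mixture used in place of an empirical distribution.
mu = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
p_hat = [0.3 * mu[0][y] + 0.7 * mu[1][y] for y in range(3)]
estimate = decompose_mixture(mu, p_hat, c0=0.05)
```

In the adaptive setting, `p_hat` would simply be replaced at each iteration by the empirical distribution available at that time, leaving both projection steps unchanged.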

We now check that (18) is satisfied for some values of $b$ and $B$. As $\mathcal{P}_n$ and $\mathcal{Q}$ are sets of
distributions, we can choose $B = 1$. For all $Q \in \mathcal{Q}$, $i \in \{1, \ldots, I\}$, $y \in \mathcal{Y}$ we have
$Q(i, y) \ge \mu_0 c_0 > 0$. However, for $P \in \mathcal{P}_n$, we have in general only $P(i, y) \ge 0$. In order to apply the
results from Section IV-A, we need to show that we can, without loss of optimality, restrict the sets
$\mathcal{P}_n$ to contain only distributions $P$ that are bounded below by some $p_0 > 0$. In other words, we need to
show that the projections on $\mathcal{P}_n$ are bounded below by $p_0$.

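Assume for the moment that the empirical distribution $\hat{P}_n$ is close to the true one in the sense that

    \Big| \hat{P}_n(y) - \sum_{i=1}^{I} c_i \mu_i(y) \Big| \le \frac{\mu_0}{2}

for all $y \in \mathcal{Y}$. As $\sum_i c_i \mu_i(y) \ge \mu_0$, this implies $\hat{P}_n(y) \ge \mu_0 / 2$ for all $y$. From (21), this
implies that the projection $P$ in $\mathcal{P}_n$ of any point in $\mathcal{Q}$ satisfies
$P(i, y) \ge \frac{1}{2} c_0 \mu_0^2 \triangleq p_0$ for all $i \in \{1, \ldots, I\}$, $y \in \mathcal{Y}$. Hence in this case
$\mathcal{M}(\Sigma, b, B)$ satisfies (18) with $b = \frac{1}{2} c_0 \mu_0^2$ and $B = 1$.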

It remains to argue that $\hat{P}_n$ is close to $\sum_i c_i \mu_i(y)$. Suppose instead of constructing the set
$\mathcal{P}_n$ (see (20)) with respect to $\hat{P}_n$, we construct it with respect to the distribution $\tilde{P}_n$ defined as



where A is chosen such that $\sum_y \tilde{P}_n(y) = 1$. $\tilde{P}_n$ is bounded below by $\mu_0 / 2$ by construction.
Moreover, by the strong law of large numbers,

    \mathbb{P}(\tilde{P}_n \neq \hat{P}_n \text{ i.o.}) = 0.

Hence we have $\mathcal{P}_n \xrightarrow{d_H} \mathcal{P}$ almost surely, where $\mathcal{P}$ is constructed as in (20) with respect
to the true distribution $\sum_i c_i \mu_i$.

Applying now the results from Section IV-A and Theorem 4 yields that under the AAM algorithm

    \liminf_{n \to \infty} D(P_n, Q_n) = D(\mathcal{P}, \mathcal{Q})

almost surely, and that every limit point of $\{(P_n, Q_n)\}_{n \ge 0}$ achieving this lim inf is an element of
$\mathcal{G}(\mathcal{P}, \mathcal{Q})$.

Since by the law of the iterated logarithm, convergence of $\mathcal{P}_n$ to $\mathcal{P}$ is only
$\Theta(\sqrt{\log \log n} / \sqrt{n})$ as $n \to \infty$ almost surely, and since $\lim_{\varepsilon \to 0} \omega(\varepsilon) / \varepsilon = 0$
only if $D$ is a constant [8], we can in this scenario not conclude from Theorem 4 that
$\lim_{n \to \infty} D(P_n, Q_n) = D(\mathcal{P}, \mathcal{Q})$.

As noted in [7], a special case of the decomposition of mixture problem is that of maximizing the
expected value of $\log \sum_{i=1}^{I} c_i W_i$, where $\{W_i\}_{i=1}^{I}$ is distributed according to $\hat{P}_n$. The standard
alternating divergence minimization algorithm is then the same as Cover's portfolio optimization algorithm [4].
Thus the AAM algorithm applied as before yields also an adaptive version of this portfolio optimization
algorithm.



V. Projections in Hilbert Space 

In this section, we specialize the algorithm from Section III to the case of minimization in a Hilbert
space. A large class of problems can be formulated as alternating projections in Hilbert spaces. Examples
include problems in filter design, signal recovery, and spectral estimation. For an extensive overview,
see [9]. In the context of Hilbert spaces, the alternating minimization algorithm is often called POCS
(Projection Onto Convex Sets).

A. Setting

Let $\mathcal{M}$ be a compact subset of a Hilbert space with the usual norm $d(A, B)^2 = \langle A - B, A - B \rangle$. Then
$(\mathcal{M}, d)$ is a compact metric space. The cost function $D$ of interest is

    D(A, B) \triangleq d(A, B)^2.

The function $D$ is continuous and convex. Define the function $\delta$ (as part of conditions C1 and C2) as

    \delta(A, \tilde{A}) \triangleq d(A, \tilde{A})^2.

In [1], it is established that for convex $\mathcal{P}$ and $\mathcal{Q}$ the pair of functions $D$, $\delta$ satisfies the
"three point" and "four point" properties C1 and C2. Hence Theorem 4 applies in this setting.

B. Application: Set Theoretic Signal Processing and Adaptive Filter Design 

In this section, we consider a problem in the Hilbert space setting as defined in Section V-A. Let
$\{\mathcal{S}_i\}_{i=1}^{I}$ be a collection of convex compact subsets of the Hilbert space $\mathbb{R}^k$ with the usual inner
product, and let $\{c_i\}_{i=1}^{I}$ be positive weights summing to one. In set-theoretic signal processing, the
objective is to find a point $A$ minimizing

    \sum_{i=1}^{I} c_i \, d(A, \mathcal{S}_i)^2,        (22)

where $d(A, \mathcal{S}_i) \triangleq \min_{S \in \mathcal{S}_i} d(A, S)$. Many problems in signal processing can be formulated in this
way. Applications can be found for example in control, filter design, and estimation. For an overview and
extensive list of references, see [9]. As an example, in a filter design problem, the $\mathcal{S}_i$ could be constraints
on the impulse and frequency responses of a filter [10], [11].

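Following [12], this problem can be formulated in our framework by defining the Hilbert space
$\mathcal{H} \triangleq \mathbb{R}^{Ik}$ with inner product

    \langle \mathbf{A}, \mathbf{B} \rangle \triangleq \sum_{i=1}^{I} c_i \langle A_i, B_i \rangle,

where $A_i, B_i \in \mathbb{R}^k$ for $i \in \{1, \ldots, I\}$ are the components of $\mathbf{A}$ and $\mathbf{B}$. Let

    \mathcal{S} \triangleq \mathrm{conv}\Big( \bigcup_{i=1}^{I} \mathcal{S}_i \Big) \subset \mathbb{R}^k

be the convex hull of the union of the constraint sets $\{\mathcal{S}_i\}_{i=1}^{I}$, and let

    \mathcal{M} \triangleq \mathcal{S}^I \subset \mathcal{H}

be its $I$-fold product. Since each of the sets $\mathcal{S}_i$ is compact, $\mathcal{M}$ is compact and by definition also
convex. We define the set $\mathcal{P} \subset \mathcal{M}$ as

    \mathcal{P} \triangleq \{ (P, \ldots, P) \in \mathcal{H} : P \in \mathcal{S} \}

and the set $\mathcal{Q} \subset \mathcal{M}$ as

    \mathcal{Q} \triangleq \mathcal{S}_1 \times \cdots \times \mathcal{S}_I.        (23)

We now show how the projections on the sets $\mathcal{P}$ and $\mathcal{Q}$ can be computed. For a fixed
$\mathbf{P} = (P, \ldots, P) \in \mathcal{P}$, the $\mathbf{Q} \in \mathcal{Q}$ minimizing $D(\mathbf{P}, \mathbf{Q})$ has the form

    (S_1(P), \ldots, S_I(P)),

where $S_i(P)$ is the $Q_i \in \mathcal{S}_i$ minimizing $\|P - Q_i\|^2$. For a fixed $\mathbf{Q} = (Q_1, \ldots, Q_I) \in \mathcal{Q}$ the
$\mathbf{P} \in \mathcal{P}$ minimizing $D(\mathbf{P}, \mathbf{Q})$ is given by

    \mathbf{P} = (P, \ldots, P) \quad \text{with} \quad P = \sum_{i=1}^{I} c_i Q_i,

which lies in $\mathcal{S}$ since it is a convex combination of points of $\mathcal{S}$.

Moreover, a solution to (22) can be found from the standard alternating minimization algorithm for Hilbert
spaces on $\mathcal{P}$ and $\mathcal{Q}$.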

To this point, we have assumed that the constraint sets $\{\mathcal{S}_i\}_{i=1}^{I}$ are constant. The results from
Section III enable us to look at situations in which the constraint sets $\{\mathcal{S}_{i,n}\}_{i=1}^{I}$ are time-varying.
Returning to the filter design example mentioned above, we are now interested in an adaptive filter. The need
for such filters arises in many different situations (see, e.g., [13]).

The time-varying sets $\{\mathcal{S}_{i,n}\}_{i=1}^{I}$ give rise to sets $\mathcal{Q}_n$, defined in analogy to (23). We assume
again that $\mathcal{S}_{i,n} \xrightarrow{d_H} \mathcal{S}_i$ for all $i \in \{1, \ldots, I\}$, and let $\mathcal{Q}$ be defined with respect to the
limiting $\{\mathcal{S}_i\}_{i=1}^{I}$ as before. Applying the results from Section V-A and Theorem 4, we obtain convergence
and correctness of the AAM algorithm.
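The product-space construction leads to a very simple recursion, sketched below under assumptions made only for illustration: the constraint sets are axis-aligned boxes (so each projection is a coordinate-wise clip), they vary with $n$ through a hypothetical function `boxes_at`, and the step over the diagonal set is taken to be the weighted average of the current components.

```python
def project_onto_box(point, box):
    """Euclidean projection of `point` onto the box [lo_1, hi_1] x ... x [lo_k, hi_k]."""
    return [min(max(x, lo), hi) for x, (lo, hi) in zip(point, box)]


def adaptive_pocs(q0, boxes_at, weights, num_iters):
    """AAM in the product space; boxes_at(n) returns the constraint sets at time n."""
    qs = [list(q) for q in q0]
    dim = len(qs[0])
    for n in range(1, num_iters + 1):
        # Step over the diagonal set: weighted average of the current components.
        p = [sum(c * q[j] for c, q in zip(weights, qs)) for j in range(dim)]
        # Step over the product set: project p onto each set revealed at time n.
        qs = [project_onto_box(p, box) for box in boxes_at(n)]
    return p, qs


def boxes_at(n):
    # Hypothetical time-varying intervals converging to [0, 1] and [3, 4].
    return [[(0.0, 1.0 + 2.0 ** -n)], [(3.0 - 2.0 ** -n, 4.0)]]


weights = [0.25, 0.75]
q0 = [[0.0], [4.0]]  # one point in each of the sets revealed at time 0
p, qs = adaptive_pocs(q0, boxes_at, weights, num_iters=60)

# Weighted squared distance (22) evaluated at the final point.
cost = sum(c * (p[0] - q[0]) ** 2 for c, q in zip(weights, qs))
```

For the limiting intervals [0, 1] and [3, 4] with weights 0.25 and 0.75, the objective (22) is minimized at the point 2.5 with value 0.75, which is where this sketch settles.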

VI. Conclusions 

We considered a fairly general adaptive alternating minimization algorithm, and found sufficient
conditions for its convergence and correctness. This adaptive algorithm has applications in a variety of settings.
We discussed in detail how to apply it to three different problems (from statistics, finance, and signal 
processing). 